issue/194: Support Quantization Config and Quantized Model Inference #195
qinyiqun wants to merge 10 commits into InfiniTensor:main
Conversation
qinyiqun commented on Jan 21, 2026
- Add a quantization option to the linear classes.
- Introduce the nlohmann json library.
- Add two top-level classes, quantization config and global config, to support multiple kinds of advanced feature config (see the sketch after this list).
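A minimal sketch of how such a quantization section might be read with nlohmann json. The field names (`quantization`, `quant_method`, `bits`) and the helper `load_quant_config` are illustrative assumptions, not taken from this PR:

```cpp
#include "nlohmann/json.hpp"
#include <fstream>
#include <optional>
#include <string>

struct QuantFileConfig {
    std::string method; // e.g. "w8a8"
    int bits = 8;
};

// Returns nullopt when the model config carries no quantization section.
inline std::optional<QuantFileConfig> load_quant_config(const std::string &path) {
    std::ifstream in(path);
    const nlohmann::json j = nlohmann::json::parse(in);
    if (!j.contains("quantization")) {
        return std::nullopt; // unquantized model
    }
    QuantFileConfig cfg;
    cfg.method = j["quantization"].value("quant_method", std::string{});
    cfg.bits = j["quantization"].value("bits", 8);
    return cfg;
}
```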
csrc/engine/rank_worker.cpp (Outdated)

      // Create model using factory (may be expensive)
    - model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr);
    + model_ = InfinilmModelFactory::createModel(model_config_, rank_info_, pending_cache_config_ != nullptr ? pending_cache_config_.get() : nullptr, global_config_);
Reviewer: Why is there both a model config and a global config?

qinyiqun: model config is the original llama_config; global config is now only responsible for advanced features.

Reviewer: Replace the original llama config with a generic json.
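A minimal sketch of what that suggestion could look like, assuming the parsed file is kept as a generic `nlohmann::json` instead of a per-architecture struct. The class name and accessors here are hypothetical:

```cpp
#include "nlohmann/json.hpp"
#include <cstddef>
#include <utility>

class JsonModelConfig {
public:
    explicit JsonModelConfig(nlohmann::json j) : j_(std::move(j)) {}

    // Typed accessors replace a dedicated config struct per architecture.
    std::size_t hidden_size() const { return j_.at("hidden_size").get<std::size_t>(); }
    std::size_t num_layers() const { return j_.at("num_hidden_layers").get<std::size_t>(); }

private:
    nlohmann::json j_;
};
```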
    }

    infinicore::nn::Parameter QKVParallelLinear::get_q_weight_scale() const {
        return infinicore::nn::Parameter(

qinyiqun: I don't think that's necessary. It's similar to bias: bias is controlled by a single has_bias flag, and these get_xx_scale() getters are used inside the macros, which as written are bound to the quantization method. Even if the getter returned an optional, the macro would still have to unwrap it.
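A side-by-side sketch of the two designs discussed above, with placeholder types; `has_weight_scale` and `weight_scale()` are hypothetical names, not the PR's API:

```cpp
#include <optional>

struct Parameter {}; // stand-in for infinicore::nn::Parameter

class LinearSketch {
public:
    bool has_weight_scale = false; // flag style, mirroring has_bias

    // Optional style: every caller (here, the init macros) must unwrap.
    std::optional<Parameter> weight_scale() const {
        if (!has_weight_scale) {
            return std::nullopt;
        }
        return scale_;
    }

private:
    Parameter scale_;
};
```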
Force-pushed from 85c2485 to 9dee06a.
csrc/config/global_config.hpp (Outdated)

    #include <fstream>
    #include <string>

    namespace infinilm::config::global_config {

Reviewer: There should be no need for the extra global_config namespace.
csrc/config/global_config.hpp (Outdated)

    #include <string>

    namespace infinilm::config::global_config {
    struct GlobalConfig {

Reviewer: Just use a class. Also, consider renaming it to something more intuitive, such as ModelConfig.

qinyiqun: My original idea was that it could also wrap the distributed config and the kv cache config, which is why I called it global_config.
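A sketch of the aggregation the reply describes; the member names and the sub-config types below are stand-ins, not the PR's definitions:

```cpp
#include <optional>

struct QuantConfig { int bits = 8; };         // stand-ins for the real
struct DistConfig { int tp_size = 1; };       // per-feature config types
struct KVCacheConfig { int max_blocks = 0; };

struct GlobalConfig {
    std::optional<QuantConfig> quantization;  // absent for fp models
    std::optional<DistConfig> distributed;    // candidates for future
    std::optional<KVCacheConfig> kv_cache;    // inclusion, per the reply above
};
```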
csrc/config/quant_config.hpp (Outdated)

    #include "../quantization/quantization.hpp"
    #include "nlohmann/json.hpp"

    namespace infinilm::config::quantization {

Reviewer: Likewise, the quantization namespace isn't needed here. These configs shouldn't run into any name collisions.
| #include "nlohmann/json.hpp" | ||
|
|
||
| namespace infinilm::quantization { | ||
| class BaseQuantization { |
There was a problem hiding this comment.
这层封装的意义是什么,看着好像只是传了个quant scheme,但这个功能不是QuantConfig就能做吗
There was a problem hiding this comment.
现在传config是因为逻辑太少了,为之后开发预留的类,现在有一个需求是模型级别的量化,需要在量化方法之上进行一个封装
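A sketch of the role the reply describes, with placeholder types; the virtual hook `should_quantize_layer` is an illustrative assumption about what the reserved class might grow into:

```cpp
struct QuantConfig { int bits = 8; }; // stand-in for the real QuantConfig

class BaseQuantization {
public:
    explicit BaseQuantization(QuantConfig cfg) : cfg_(cfg) {}
    virtual ~BaseQuantization() = default;

    // Today this only forwards the scheme taken from the config...
    int bits() const { return cfg_.bits; }

    // ...but a subclass could later make model-level decisions, e.g. which
    // layers stay in full precision.
    virtual bool should_quantize_layer(int /*layer_idx*/) const { return true; }

private:
    QuantConfig cfg_;
};
```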
    // ========================= QKV Quantization ==================================
    #define INFINILM_QKV_LINEAR_W8A8_INIT(name, q_name, k_name, v_name, ...) \
        name##_ = std::make_shared<layers::QKVParallelLinear>(__VA_ARGS__);  \
        /* Register the Q weight */                                          \
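The excerpt above cuts off mid-macro. A hypothetical continuation of the pattern, with stand-in types and a renamed macro to make clear it is not copied from the PR: after constructing the layer, such a macro could register each fused weight together with its quantization scale.

```cpp
#include <memory>
#include <string>

struct Parameter {}; // stand-ins for the real parameter and layer types
struct QKVLinear {
    Parameter q_weight() const { return {}; }
    Parameter q_weight_scale() const { return {}; }
};

// Stand-in for the model's parameter registry.
inline void register_parameter(const std::string & /*name*/, Parameter /*p*/) {}

#define SKETCH_QKV_LINEAR_W8A8_INIT(member, q_name, ...)               \
    member = std::make_shared<QKVLinear>(__VA_ARGS__);                 \
    /* Register the Q weight and its quantization scale together. */   \
    register_parameter(q_name, member->q_weight());                    \
    register_parameter(std::string(q_name) + ".weight_scale",          \
                       member->q_weight_scale());
```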
    std::shared_ptr<InfinilmModel> model;
    if (const auto llama_config_ptr = dynamic_cast<const models::llama::LlamaConfig *>(&config)) {
        const auto &llama_config = *llama_config_ptr;
        //****************************NEED TO BE FIXED */
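For context, a minimal sketch of the dispatch pattern this excerpt uses; the surrounding factory code is assumed, and the types below are stand-ins:

```cpp
#include <memory>
#include <stdexcept>

struct ConfigBase { virtual ~ConfigBase() = default; };
struct LlamaConfig : ConfigBase {}; // stand-in for models::llama::LlamaConfig
struct InfinilmModel {};

inline std::shared_ptr<InfinilmModel> create_model(const ConfigBase &config) {
    // Downcast per architecture; unknown config types are an error.
    if (dynamic_cast<const LlamaConfig *>(&config) != nullptr) {
        return std::make_shared<InfinilmModel>(); // build the llama model here
    }
    throw std::runtime_error("unsupported model config type");
}
```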
    //------------------------------------------------------
    InferEngine::InferEngine(
        const InfinilmModel::Config &config,
        const distributed::DistConfig &distributed_config,

Reviewer: Why does this change touch this interface at all, and why reorder its parameters on top of that? Modifying a foundational interface like this is a high-risk change. Are you sure nothing will break elsewhere?
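One lower-risk alternative, sketched here as an assumption rather than anything proposed in the PR: append the new argument with a default instead of reordering, so existing call sites keep compiling unchanged. Types are placeholders.

```cpp
#include <memory>

struct Config {};        // stand-ins for InfinilmModel::Config,
struct DistConfig {};    // distributed::DistConfig, and the new
struct GlobalConfig {};  // global config type

class InferEngine {
public:
    // New parameter appended with a default: callers that pass only
    // (config, distributed_config) are unaffected.
    InferEngine(const Config &config,
                const DistConfig &distributed_config,
                std::shared_ptr<const GlobalConfig> global_config = nullptr);
};
```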